How Good Are Association-Rule Mining Algorithms?
نویسندگان
چکیده
We address the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an “Oracle algorithm” that knows in advance the identities of all frequent itemsets in the database and only needs to gather the actual supports of these itemsets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm will have to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present a new mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and is guaranteed to complete in two passes over the database. This is in marked contrast to the earlier approaches which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
منابع مشابه
New Approaches to Analyze Gasoline Rationing
In this paper, the relation among factors in the road transportation sector from March, 2005 to March, 2011 is analyzed. Most of the previous studies have economical point of view on gasoline consumption. Here, a new approach is proposed in which different data mining techniques are used to extract meaningful relations between the aforementioned factors. The main and dependent factor is gasolin...
متن کاملNumeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...
متن کاملIntroducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
متن کاملOptimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining
The Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page is one type of web data, was regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...
متن کاملA Comparative Study of Association Rule Mining Algorithms on Grid and Cloud Platform
Association rule mining is a time consuming process due to involving both data intensive and computation intensive nature. In order to mine large volume of data and to enhance the scalability and performance of existing sequential association rule mining algorithms, parallel and distributed algorithms are developed. These traditional parallel and distributed algorithms are based on homogeneous ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002